User equipment is one of the main bottlenecks facing the gaming industry today. Highly realistic modern games place heavy computational demands on user devices. In response, the game industry has proposed Cloud Gaming, a paradigm that improves the gaming experience on devices with limited hardware. Games are hosted on remote servers, so the user's device acts only as a peripheral for interacting with the game. However, this paradigm loads the communication links connecting users to the cloud, and service experience therefore becomes highly dependent on network connectivity. To overcome this, Cloud Gaming is expected to benefit from the promised performance of 5G and future 6G networks, together with the flexibility provided by mobility in multi-RAT scenarios that include technologies such as WiFi. In this scope, the present work proposes a framework for measuring and estimating the main end-to-end (E2E) metrics of the Cloud Gaming service, namely its key quality indicators (KQIs). In addition, different machine learning techniques are assessed for predicting KQIs related to the Cloud Gaming user's experience. To this end, the main KQIs of the service, such as input lag, freeze percentage, and perceived video frame rate, are collected in a real environment. Results show that machine learning techniques provide a good estimate of these indicators solely from network-based metrics. This is considered a valuable asset for guiding the delivery of Cloud Gaming services over cellular networks even without access to the user's device, as is expected to be the case for telecom operators.
The task of motion forecasting is critical for self-driving vehicles (SDVs) to be able to plan a safe maneuver. Towards this goal, modern approaches reason about the map, the agents' past trajectories and their interactions in order to produce accurate forecasts. The predominant approach has been to encode the map and other agents in the reference frame of each target agent. However, this approach is computationally expensive for multi-agent prediction as inference needs to be run for each agent. To tackle the scaling challenge, the solution thus far has been to encode all agents and the map in a shared coordinate frame (e.g., the SDV frame). However, this is sample inefficient and vulnerable to domain shift (e.g., when the SDV visits uncommon states). In contrast, in this paper, we propose an efficient shared encoding for all agents and the map without sacrificing accuracy or generalization. Towards this goal, we leverage pair-wise relative positional encodings to represent geometric relationships between the agents and the map elements in a heterogeneous spatial graph. This parameterization allows us to be invariant to scene viewpoint, and save online computation by re-using map embeddings computed offline. Our decoder is also viewpoint agnostic, predicting agent goals on the lane graph to enable diverse and context-aware multimodal prediction. We demonstrate the effectiveness of our approach on the urban Argoverse 2 benchmark as well as a novel highway dataset.
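The viewpoint invariance claimed above can be illustrated with a small sketch. It is not the paper's encoding: it assumes each agent or map element is summarized by a 2D pose `(x, y, heading)`, and the helper names `relative_pose`, `transform`, and `pairwise_encoding` are hypothetical. The point is only that pose differences expressed in each element's own frame do not change when the whole scene is viewed from another coordinate frame.

```python
import math

def relative_pose(a, b):
    """Pose of element b expressed in the local frame of element a.

    Each pose is an assumed (x, y, heading) triple in a shared frame.
    """
    ax, ay, ath = a
    bx, by, bth = b
    dx, dy = bx - ax, by - ay
    c, s = math.cos(ath), math.sin(ath)
    # Rotate the shared-frame offset into a's local frame.
    local_x = c * dx + s * dy
    local_y = -s * dx + c * dy
    # Wrap the heading difference into [-pi, pi].
    dth = math.atan2(math.sin(bth - ath), math.cos(bth - ath))
    return (local_x, local_y, dth)

def transform(pose, tx, ty, rot):
    """Apply a global SE(2) change of viewpoint to a pose."""
    x, y, th = pose
    c, s = math.cos(rot), math.sin(rot)
    return (c * x - s * y + tx, s * x + c * y + ty, th + rot)

def pairwise_encoding(poses):
    """Matrix of relative poses; entry [i][j] is element j seen from element i."""
    return [[relative_pose(a, b) for b in poses] for a in poses]
```

Calling `pairwise_encoding` on a list of poses and on the same poses passed through `transform` yields the same matrix up to floating-point error. This is also why embeddings that depend only on such pairwise terms could, in principle, be computed once for static map elements and reused.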
Calibration is a popular framework to evaluate whether a classifier knows when it does not know - i.e., its predictive probabilities are a good indication of how likely a prediction is to be correct. Correctness is commonly estimated against the human majority class. Recently, calibration to human majority has been measured on tasks where humans inherently disagree about which class applies. We show that measuring calibration to human majority given inherent disagreements is theoretically problematic, demonstrate this empirically on the ChaosNLI dataset, and derive several instance-level measures of calibration that capture key statistical properties of human judgements - class frequency, ranking and entropy.
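To make the contrast with majority-based calibration concrete, the sketch below compares a model's predictive distribution with the full distribution of human votes on a single instance. The three quantities are illustrative stand-ins for the class-frequency, ranking, and entropy properties named above, not the measures derived in the paper; the function names and the use of total variation distance are assumptions.

```python
import math

def entropy(p):
    """Shannon entropy in nats, skipping zero-probability classes."""
    return -sum(q * math.log(q) for q in p if q > 0)

def ranking(p):
    """Class indices ordered from most to least probable (ties keep index order)."""
    return sorted(range(len(p)), key=lambda k: -p[k])

def instance_measures(model_probs, human_votes):
    """Compare one model distribution with the human vote counts for one item."""
    total = sum(human_votes)
    human = [v / total for v in human_votes]
    return {
        # Class frequency: total variation distance between the distributions.
        "freq_distance": 0.5 * sum(abs(m - h) for m, h in zip(model_probs, human)),
        # Ranking: does the model order the classes the same way humans do?
        "rank_match": ranking(model_probs) == ranking(human),
        # Entropy: negative when the model is more certain than the annotators.
        "entropy_gap": entropy(model_probs) - entropy(human),
        # The majority-only view, kept for comparison.
        "majority_match": ranking(model_probs)[0] == ranking(human)[0],
    }
```

For example, a model that outputs `[0.9, 0.05, 0.05]` on an item where annotators voted `[55, 40, 5]` agrees with the human majority, yet its negative `entropy_gap` and non-zero `freq_distance` show that it is far more certain than the annotators were.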
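Recovering the spatial layout of the cameras and the geometry of the scene from extreme-view images is a long-standing challenge in computer vision. Prevailing 3D reconstruction algorithms usually adopt the image-matching paradigm and assume that part of the scene is co-visible across images, so they perform poorly when the inputs have little overlap. In contrast, humans can associate the visible parts in one image with the corresponding invisible components in another image through prior knowledge of shapes. Inspired by this fact, we propose a novel concept called virtual correspondences (VCs). A VC is a pair of pixels from two images whose camera rays intersect in 3D. Like classic correspondences, VCs conform to epipolar geometry; unlike classic correspondences, VCs do not need to be co-visible across views. VCs can therefore be established and exploited even if the images do not overlap. We introduce a method for finding virtual correspondences based on humans in the scene. We show how VCs can be seamlessly integrated with classic bundle adjustment to recover camera poses across extreme views. Experiments show that our method significantly outperforms state-of-the-art camera pose estimation methods in challenging scenarios and is comparable in the traditional densely captured setup. Our approach also unlocks the potential of multiple downstream tasks, such as scene reconstruction from multi-view stereo and novel view synthesis in extreme-view scenarios.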
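The statement that virtual correspondences conform to epipolar geometry can be checked numerically. The sketch below is not the paper's method for finding VCs; it only builds two cameras facing each other from nearly opposite sides, picks a 3D point where two rays meet, and evaluates the standard epipolar residual. The camera convention (`X2 = R X1 + t`, normalized image coordinates) and all helper names are assumptions made for illustration.

```python
import math

def rot_y(theta):
    """Rotation about the y axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def skew(t):
    """Cross-product matrix, so that matvec(skew(t), v) equals t x v."""
    x, y, z = t
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]

def essential(R, t):
    """Essential matrix E = [t]_x R for the convention X2 = R X1 + t."""
    return matmul(skew(t), R)

def project(X):
    """Normalized homogeneous image coordinates of a 3D point in camera frame."""
    return [X[0] / X[2], X[1] / X[2], 1.0]

def epipolar_residual(E, x1, x2):
    """x2^T E x1, which is zero when the two camera rays intersect."""
    return sum(a * b for a, b in zip(x2, matvec(E, x1)))

def point_in_camera2(R, t, X1):
    """Express a point given in camera 1 coordinates in camera 2 coordinates."""
    return [a + b for a, b in zip(matvec(R, X1), t)]
```

With `R = rot_y(math.radians(170))` and `t = [0.0, 0.0, 5.0]`, the two cameras see the point `[0.2, 0.1, 2.0]` from nearly opposite sides, yet the residual for its two projections vanishes up to rounding. The constraint depends only on the rays intersecting, not on the same surface patch being visible in both images.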
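This work proposes M3E2, a multi-task learning neural network model for estimating the effect of multiple treatments. In contrast to existing methods, M3E2 is robust to multiple treatment effects applied simultaneously to the same unit, to continuous and binary treatments, and to a large number of covariates. We compare M3E2 with three baselines on three benchmark datasets: two with multiple treatments and one with a single treatment. Our analysis shows that our method has superior performance, producing more confident estimates of the true treatment effects. The code is available at github.com/raquelaoki/m3e2.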
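We investigate the extent to which modern neural language models are susceptible to structural priming, the phenomenon whereby the structure of a sentence makes the same structure more probable in a follow-up sentence. We explore how priming can be used to study the potential of these models to learn abstract structural information, which is a prerequisite for good performance on tasks that require natural language understanding skills. We introduce a novel metric and release Prime-LM, a large corpus in which we control for various linguistic factors that interact with priming strength. We find that Transformer models do show evidence of structural priming, but that the generalizations they learn are to some extent modulated by semantic information. Our experiments also show that the representations acquired by the models may not only encode abstract sequential structure but also involve a certain level of hierarchical syntactic information. More generally, our study shows that the priming paradigm is a useful additional tool for gaining insight into the capacities of language models, and it opens the door to future priming-based investigations that probe a model's internal states.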
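Reconstructing high-quality 3D objects from sparse, partial observations from a single view is crucial for various applications in computer vision, robotics, and graphics. While recent neural implicit modeling methods show promising results on synthetic or dense data, they perform poorly on sparse and noisy real-world data. We find that the limitations of a popular neural implicit model are due to the lack of robust shape priors and the lack of proper regularization. In this work, we demonstrate reconstruction using: (i) a deep encoder as a robust initializer of the shape latent code; (ii) regularized test-time optimization of the latent code; (iii) a deep discriminator as a learned high-dimensional shape prior; (iv) a novel curriculum learning strategy that allows the model to learn shape priors on synthetic data and smoothly transfer them to sparse real-world data. Our approach better captures the global structure, performs well on occluded and sparse observations, and registers well with the ground-truth shape. We demonstrate superior performance over state-of-the-art 3D object reconstruction methods on two real-world datasets.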
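Modern autonomous driving perception systems have been shown to improve when they process complementary inputs, such as LiDAR together with images. In isolation, 2D images have been found to be extremely vulnerable to adversarial attacks. Yet there have been limited studies on the adversarial robustness of multi-modal models that fuse LiDAR features with image features. Furthermore, existing works do not consider physically realizable perturbations that are consistent across input modalities. In this paper, we showcase the practical susceptibility of multi-sensor detection by placing an adversarial object on top of a host vehicle. We focus on physically realizable and input-agnostic attacks, as they are feasible to execute in practice, and show that a single universal adversary can hide different host vehicles from state-of-the-art multi-modal detectors. Our experiments demonstrate that successful attacks are primarily caused by easily corrupted image features. Furthermore, we find that in modern sensor fusion methods that project image features into 3D, adversarial attacks can exploit the projection process to generate false positives across distant regions in 3D. Towards more robust multi-modal perception systems, we show that adversarial training with feature denoising can significantly boost robustness to such attacks. However, we find that standard adversarial defenses still struggle to prevent false positives that are also caused by inaccurate associations between 3D LiDAR points and 2D pixels.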
Standard convolutional neural networks assume a grid-structured input is available and exploit discrete convolutions as their fundamental building blocks. This limits their applicability to many real-world applications. In this paper we propose Parametric Continuous Convolution, a new learnable operator that operates over non-grid-structured data. The key idea is to exploit parameterized kernel functions that span the full continuous vector space. This generalization allows us to learn over arbitrary data structures as long as their support relationship is computable. Our experiments show significant improvement over the state of the art in point cloud segmentation of indoor and outdoor scenes, and lidar motion estimation of driving scenes.
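As a rough illustration of the key idea, the sketch below replaces a discrete kernel lookup with a small function of the continuous offset between two points. It is a simplification under stated assumptions rather than the operator defined in the paper: the kernel is a one-hidden-layer ReLU network with random (untrained) weights, features are scalars, the support of each point is taken to be its `k` nearest neighbours, and the names `make_kernel` and `continuous_conv` are hypothetical.

```python
import math
import random

def make_kernel(dim, hidden, seed=0):
    """Return g(offset) -> weight, a tiny MLP over continuous offsets.

    Weights are random here; in a real model they would be learned.
    """
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(hidden)]
    b1 = [rng.gauss(0, 1) for _ in range(hidden)]
    w2 = [rng.gauss(0, 1) for _ in range(hidden)]

    def g(offset):
        hidden_act = [max(0.0, sum(w * o for w, o in zip(row, offset)) + b)
                      for row, b in zip(w1, b1)]
        return sum(w * h for w, h in zip(w2, hidden_act))

    return g

def continuous_conv(points, feats, g, k):
    """h_i = sum over the k nearest points j of g(x_j - x_i) * f_j."""
    out = []
    for xi in points:
        # Support relationship: the k nearest points (including xi itself).
        order = sorted(range(len(points)),
                       key=lambda j: math.dist(xi, points[j]))
        total = 0.0
        for j in order[:k]:
            offset = [a - b for a, b in zip(points[j], xi)]
            total += g(offset) * feats[j]
        out.append(total)
    return out
```

Because the kernel only sees offsets between points, the output is unchanged when the whole point set is translated, and nothing in the code requires the points to lie on a grid.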
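As self-driving systems become better, simulating scenarios where the autonomy stack may fail becomes more important. Traditionally, these scenarios are generated for a few scenes with respect to the planning module, which takes ground-truth actor states as input. This does not scale and cannot identify all possible autonomy failures, such as perception failures due to occlusion. In this paper, we propose AdvSim, an adversarial framework for generating safety-critical scenarios for LiDAR-based autonomy systems. Given an initial traffic scenario, AdvSim modifies the actors' trajectories in a physically plausible manner and updates the LiDAR sensor data to match the perturbed world. Importantly, by simulating directly from sensor data, we obtain adversarial scenarios that are safety-critical for the full autonomy stack. Our experiments show that our approach is general and can identify thousands of semantically meaningful safety-critical scenarios for a wide range of modern self-driving systems. Furthermore, we show that the robustness and safety of these systems can be further improved by training them with scenarios generated by AdvSim.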